
Support snapshot in rbdplugin #61

Merged
merged 7 commits into ceph:master on Aug 9, 2018

Conversation

Contributor

@sngchlko sngchlko commented Aug 8, 2018

Snapshots are one of the features added in the CSI v0.3 spec.
This PR implements the snapshot feature in rbdplugin.

What has been added:

  • CreateSnapshot:
    • creates a snapshot with rbd snap create
    • the snapshot is protected with rbd snap protect so that it can be cloned
  • DeleteSnapshot:
    • unprotects the snapshot with rbd snap unprotect
    • removes the snapshot with rbd snap rm
  • ListSnapshots:
    • gets all snapshots from the cache
    • if the request specifies a source volume ID, returns all snapshots that share that source volume ID
    • if the request specifies a snapshot ID, returns that specific snapshot
  • Stores snapshot information in a persistent file
    • uses the same method as storing volume information, but in a different directory

What has been changed:

  • CreateVolume:
    • if the request is a restore-from-snapshot request issued by the snapshot controller, creates a volume from the snapshot with rbd clone (see the sketch after this list)
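
For readers less familiar with the rbd CLI, here is a minimal sketch of the command sequence these operations shell out to. The pool, image, and snapshot names and the execRBD helper are illustrative only; the real plugin also passes monitor and credential flags (--id, -m, --key).

// Minimal sketch of the rbd CLI sequence described above; all names are
// illustrative and authentication flags are omitted for brevity.
package main

import "os/exec"

func execRBD(args ...string) error {
    return exec.Command("rbd", args...).Run()
}

func snapshotLifecycle() error {
    // CreateSnapshot: cut the snapshot, then protect it so it can be cloned.
    if err := execRBD("snap", "create", "--pool", "rbd", "--snap", "csi-rbd-snapshot-x", "pvc-image"); err != nil {
        return err
    }
    if err := execRBD("snap", "protect", "--pool", "rbd", "--snap", "csi-rbd-snapshot-x", "pvc-image"); err != nil {
        return err
    }
    // CreateVolume (restore path): clone a new image from the protected snapshot.
    if err := execRBD("clone", "rbd/pvc-image@csi-rbd-snapshot-x", "rbd/pvc-restored"); err != nil {
        return err
    }
    // DeleteSnapshot: unprotect (fails while clones still exist), then remove.
    if err := execRBD("snap", "unprotect", "--pool", "rbd", "--snap", "csi-rbd-snapshot-x", "pvc-image"); err != nil {
        return err
    }
    return execRBD("snap", "rm", "--pool", "rbd", "--snap", "csi-rbd-snapshot-x", "pvc-image")
}

func main() { _ = snapshotLifecycle() }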

@sngchlko sngchlko force-pushed the support-snapshot-in-rbdplugin branch from 941119c to 05ecabf on August 8, 2018 07:52
@sngchlko sngchlko force-pushed the support-snapshot-in-rbdplugin branch from 05ecabf to b1ccdbb on August 8, 2018 08:16
@@ -212,3 +212,11 @@ func (ns *nodeServer) NodeUnstageVolume(
*csi.NodeUnstageVolumeResponse, error) {
return nil, status.Error(codes.Unimplemented, "")
}

func (ns *nodeServer) NodeGetInfo(
Contributor

@gman0 gman0 Aug 8, 2018

I believe we could do without overriding NodeGetInfo since it's already in csicommon.DefaultNodeServer. See https://github.com/kubernetes-csi/drivers/blob/master/pkg/csi-common/nodeserver-default.go#L47

And it also sets NodeId.

Contributor Author

You are right. I will remove this.
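
For context, a minimal sketch of why the override can simply be dropped, assuming the node server embeds csicommon.DefaultNodeServer as the linked code suggests; the struct shown is illustrative, not the PR's exact definition.

// Minimal sketch: with the embedded csicommon.DefaultNodeServer, its
// NodeGetInfo method (which already returns the NodeId) is promoted onto
// nodeServer, so no local override is needed. Import path follows the
// linked kubernetes-csi/drivers repository.
package rbd

import (
    csicommon "github.com/kubernetes-csi/drivers/pkg/csi-common"
)

type nodeServer struct {
    *csicommon.DefaultNodeServer
    // ...rbd-specific fields...
}

// No NodeGetInfo method here: calls fall through to DefaultNodeServer.NodeGetInfo.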

}

snapshotID := snapshot.GetId()
if snapshotID == "" {
Member

nit: len(snapshotID) == 0, consistent with the rest of the tests

}
rbdSnap.VolName = rbdVolume.VolName
rbdSnap.SnapName = snapName
snapshotID := "csi-rbd-snapshot-" + uniqueID
Member

nit: add image name

rbdSnap.SizeBytes = rbdVolume.VolSize

err = createSnapshot(rbdSnap, req.GetCreateSnapshotSecrets())
// if we already have the snapshot, return the snapshot
Member

any reason the snapshot is created more than once? the unique id should prevent it...


I think this is to make sure that if the plugin gets the create snapshot request for the same snapshot more than once, it won't create it again. This is to address the idempotency requirement of the CSI spec.

Contributor Author

I think so. The CSI spec explains the details:
This operation MUST be idempotent. If a snapshot corresponding to the specified snapshot name is already successfully cut and uploaded (if upload is part of the process) and is compatible with the specified source_volume_id and parameters in the CreateSnapshotRequest, the Plugin MUST reply 0 OK with the corresponding CreateSnapshotResponse.
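
A minimal, self-contained sketch of that idempotency rule; the types and the in-memory cache are illustrative stand-ins, not the PR's actual code.

// Minimal sketch of the CreateSnapshot idempotency check discussed above;
// names and types are illustrative only.
package main

import (
    "errors"
    "fmt"
)

type rbdSnapshot struct {
    SnapName       string
    SourceVolumeID string
}

var snapshotCache = map[string]rbdSnapshot{} // snapshot name -> snapshot record

// createSnapshotIdempotent returns the existing snapshot when one with the
// same name and source volume was already cut, instead of creating it again.
func createSnapshotIdempotent(name, sourceVolID string) (rbdSnapshot, error) {
    if existing, ok := snapshotCache[name]; ok {
        if existing.SourceVolumeID == sourceVolID {
            return existing, nil // replayed request: reply OK with the same snapshot
        }
        return rbdSnapshot{}, errors.New("snapshot name already used for a different source volume")
    }
    // Otherwise cut a new snapshot ("rbd snap create") and cache it.
    snap := rbdSnapshot{SnapName: name, SourceVolumeID: sourceVolID}
    snapshotCache[name] = snap
    return snap, nil
}

func main() {
    s1, _ := createSnapshotIdempotent("snap-1", "vol-a")
    s2, _ := createSnapshotIdempotent("snap-1", "vol-a") // same request replayed
    fmt.Println(s1.SnapName == s2.SnapName)              // true: no second snapshot was created
}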

rbdSnap.CreatedAt = time.Now().UnixNano()

// Storing snapInfo into a persistent file.
if err := persistSnapInfo(snapshotID, path.Join(PluginFolder, "controller-snap"), rbdSnap); err != nil {
Member

also see
#62 (comment)

Contributor Author

I agree that using local storage is a problem.
I think we need external storage to share volume and snapshot information between rbdplugin instances.
How about using etcd to store this information?
cc: @gman0

Contributor

@sngchlko @rootfs let's discuss this issue after this is done here #66

}

snapshotID := req.GetSnapshotId()
if snapshotID == "" {
Member

nit: len(snapshotID) == 0

@@ -391,3 +461,132 @@ func getRBDVolumeByName(volName string) (rbdVolume, error) {
}
return rbdVolume{}, fmt.Errorf("volume name %s does not exist in the volumes list", volName)
}

func getRBDSnapshotByName(snapName string) (rbdSnapshot, error) {
Member

nit: (*rbdSnapshot, error)

return err
}
glog.V(4).Infof("rbd: snap protect %s using mon %s, pool %s id %s key %s", image, pOpts.Monitors, pOpts.Pool, RBDUserID, key)
args := []string{"snap", "protect", "--pool", pOpts.Pool, "--snap", snapID, image, "--id", RBDUserID, "-m", mon, "--key=" + key}
Member

is RBDUserID sufficient to create the snapshot or do we need admin id for it?

Member

Ooh, I see, it is a hardcoded admin. This needs to change, eventually.

return err
}
glog.V(4).Infof("rbd: snap create %s using mon %s, pool %s id %s key %s", image, pOpts.Monitors, pOpts.Pool, RBDUserID, key)
args := []string{"snap", "create", "--pool", pOpts.Pool, "--snap", snapID, image, "--id", RBDUserID, "-m", mon, "--key=" + key}
Member

ditto

// Since "controller" folder exists, it means the rbdplugin has already been running, it means
// there might be some volumes left, they must be re-inserted into rbdVolumes map
loadExVolumes()
for _, f := range files {
Member

hold a lock here

@@ -376,6 +402,50 @@ func deleteVolInfo(image string, persistentStoragePath string) error {
return nil
}

func persistSnapInfo(snapshot string, persistentStoragePath string, snapInfo *rbdSnapshot) error {
file := path.Join(persistentStoragePath, snapshot+".json")
Member

hold a lock here

}

func deleteSnapInfo(snapshot string, persistentStoragePath string) error {
file := path.Join(persistentStoragePath, snapshot+".json")
Member

hold a lock
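
A minimal sketch of the locking being suggested in the notes above, using an illustrative package-level mutex around the snapshot persistence helpers; the trimmed rbdSnapshot type is a stand-in for the PR's real struct.

// Minimal sketch of guarding the snapshot persistence helpers with a mutex;
// snapMutex and the trimmed rbdSnapshot type are illustrative.
package main

import (
    "encoding/json"
    "fmt"
    "io/ioutil"
    "os"
    "path"
    "sync"
)

type rbdSnapshot struct {
    SnapName string `json:"snapName"`
}

var snapMutex sync.Mutex // guards the on-disk snapshot records

func persistSnapInfo(snapshot, persistentStoragePath string, snapInfo *rbdSnapshot) error {
    snapMutex.Lock()
    defer snapMutex.Unlock()
    data, err := json.Marshal(snapInfo)
    if err != nil {
        return fmt.Errorf("rbd: failed to marshal snapInfo: %v", err)
    }
    file := path.Join(persistentStoragePath, snapshot+".json")
    // Write the record in one call so a concurrent reader never sees a partial file.
    return ioutil.WriteFile(file, data, 0600)
}

func deleteSnapInfo(snapshot, persistentStoragePath string) error {
    snapMutex.Lock()
    defer snapMutex.Unlock()
    return os.Remove(path.Join(persistentStoragePath, snapshot+".json"))
}

func main() {
    _ = persistSnapInfo("csi-rbd-snapshot-x", os.TempDir(), &rbdSnapshot{SnapName: "csi-rbd-snapshot-x"})
    _ = deleteSnapInfo("csi-rbd-snapshot-x", os.TempDir())
}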

// Storing snapInfo into a persistent file.
if err := persistSnapInfo(snapshotID, path.Join(PluginFolder, "controller-snap"), rbdSnap); err != nil {
glog.Warningf("rbd: failed to store sanpInfo with error: %v", err)
return nil, err
Member

delete snapshot
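
A minimal sketch of the cleanup being asked for here: if persisting the snapshot record fails, roll back the just-created rbd snapshot before returning the error. The helpers are illustrative stand-ins, not the PR's actual functions.

// Minimal sketch of rolling back the rbd snapshot when persisting its record
// fails; persistSnapInfo and deleteSnapshot below are illustrative stubs.
package main

import "fmt"

func persistSnapInfo(id string) error { return fmt.Errorf("disk full") } // illustrative failure
func deleteSnapshot(id string) error  { return nil }                     // would run "rbd snap unprotect" + "rbd snap rm"

func createSnapshotRecord(id string) error {
    if err := persistSnapInfo(id); err != nil {
        // Roll back the rbd snapshot so a failed request leaves nothing behind.
        if derr := deleteSnapshot(id); derr != nil {
            return fmt.Errorf("persist failed (%v) and rollback failed (%v)", err, derr)
        }
        return err
    }
    return nil
}

func main() { fmt.Println(createSnapshotRecord("csi-rbd-snapshot-x")) }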

Member

rootfs commented Aug 8, 2018

@sngchlko thanks for getting this done here

@xing-yang can you take a look? thanks

@rootfs rootfs closed this Aug 8, 2018
@rootfs rootfs reopened this Aug 8, 2018
Member

rootfs commented Aug 8, 2018

@sngchlko just to confirm, does rbd snapshot and restore work on your krbd test without any special rbd image features?

@xing-yang

It will be great if you can test this driver with the snapshot controller PR: kubernetes-csi/external-snapshotter#7

Contributor Author

sngchlko commented Aug 9, 2018

@rootfs I had to use the layering feature to create the new volume from the snapshot. I think it would be good to check the image features at the time of creating the snapshot (a sketch of such a check follows below).

@xing-yang external-snapshotter sends CreateSnapshotSecrets as nil when creating the snapshot. Actually, I tested with my own csi-snapshot-controller implementation, based on the snapshot-controller in kubernetes-incubator. I hope to test with external-snapshotter soon.
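
A minimal sketch of such a feature check, verifying that the source image has the layering feature before cutting a snapshot that will later be cloned; the JSON shape of rbd info --format json and the names used here are assumptions, not taken from this PR.

// Minimal sketch of checking for the "layering" image feature; the JSON
// fields and image names are illustrative.
package main

import (
    "encoding/json"
    "fmt"
    "os/exec"
)

type rbdImageInfo struct {
    Features []string `json:"features"`
}

func hasLayering(pool, image string) (bool, error) {
    out, err := exec.Command("rbd", "info", "--pool", pool, image, "--format", "json").Output()
    if err != nil {
        return false, err
    }
    var info rbdImageInfo
    if err := json.Unmarshal(out, &info); err != nil {
        return false, err
    }
    for _, f := range info.Features {
        if f == "layering" {
            return true, nil
        }
    }
    return false, nil
}

func main() {
    ok, err := hasLayering("rbd", "pvc-image")
    fmt.Println(ok, err) // cloning a snapshot requires the layering feature
}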

@rootfs rootfs merged commit 43b9f9a into ceph:master Aug 9, 2018
Member

rootfs commented Aug 9, 2018

@sngchlko thanks!

@xing-yang

@sngchlko We need to fix the "CreateSnapshotSecrets as nil" issue in the External-snapshotter PR. Thanks for your feedback! Please let us know if you see other issues in that PR.

@gman0 gman0 mentioned this pull request Aug 9, 2018
}
} else {
glog.V(4).Infof("create snapshot %s", snapName)
err = protectSnapshot(rbdSnap, rbdSnap.AdminId, req.GetCreateSnapshotSecrets())


Sorry for the late comment, but why are you protecting the snapshot? Snapshot protection was only required to support RBD cloning, but with Mimic and its new clone v2 support, snapshot protection is on its way to being deprecated. Since this is all new functionality, it would be great if we could start w/ a clean implementation.

Member

thanks @dillaman

@sngchlko do you have a chance to consider this?

Contributor Author

@dillaman Thanks for the review. Good point!
But I think supporting snapshots only on Mimic is not good.
In my case, I have to use the Luminous version, since I installed Ceph with the Rook project, which only supports Luminous for now.
And people who want to use the snapshot feature in CSI would have to upgrade to Mimic.
How about using the protect operation depending on the Ceph version?


@sngchlko Like I said, the "protect" operation is only relevant for cloning images (using older versions of Ceph). Creating a snapshot only requires the "rbd snap create" command.

Contributor Author

@dillaman You are right. But I think we need to protect the snapshot in order to create a new image from it. The CSI spec seems to require creating a new volume when restoring a snapshot. AFAIK, cloning a snapshot requires protecting it first.


@sngchlko I'm not familiar w/ the CSI spec, can you provide a link? I see [1] but it states revert isn't part of it. I would really like to avoid going down the tangled code of the OpenStack world due to how snapshots and clones needed to be handled before, which is why we added clone v2. Since this is all new functionality, it would be great if we could start off on a clean footing and avoid having to handle things like renaming protected snapshots "XYZ_deleteme" and images as "XYZ_deleteme" if they have a linked clone.

[1] https://docs.google.com/document/d/1oXVuDTNhXerr1UJ48tQrxF9J6vbm2LHVVC3z28655Qo/edit#heading=h.aqok97b5hs7t

Contributor Author

@dillaman Sure, you can find VolumeContentSource, used for restoring a snapshot, in the CSI spec. If VolumeContentSource is set, the plugin creates the new volume from the snapshot (a sketch follows below).
I don't think I can decide on my own whether to support snapshots only on Mimic.
@rootfs, @gman0 What do you think about this?
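
For reference, a minimal, self-contained sketch of the VolumeContentSource check described above; the request type is an illustrative stand-in for the CSI protobuf message, not the real one.

// Minimal sketch of branching on a snapshot content source in CreateVolume;
// types and names are illustrative only.
package main

import "fmt"

type snapshotSource struct{ SnapshotID string }

type createVolumeRequest struct {
    Name          string
    ContentSource *snapshotSource // nil means "create an empty volume"
}

func createVolume(req createVolumeRequest) string {
    if req.ContentSource != nil && req.ContentSource.SnapshotID != "" {
        // Restore path: clone the new image from the snapshot ("rbd clone").
        return fmt.Sprintf("clone %s from snapshot %s", req.Name, req.ContentSource.SnapshotID)
    }
    // Default path: create an empty image ("rbd create").
    return fmt.Sprintf("create empty volume %s", req.Name)
}

func main() {
    fmt.Println(createVolume(createVolumeRequest{Name: "pvc-restored", ContentSource: &snapshotSource{SnapshotID: "csi-rbd-snapshot-x"}}))
    fmt.Println(createVolume(createVolumeRequest{Name: "pvc-empty"}))
}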


@sngchlko Thanks. If the only use-case for snapshots right now is to restore and if restore requires creating new volumes (instead of rolling back), perhaps the short-term (pre-Mimic) thing to do would be to protect the snapshot upon create from snapshot, flatten the new volume, and unprotect the snapshot. Mimic and later clients can just directly clone the snapshot into a new image. --- or just only support creating images from a snapshot if you are using Mimic or later and enjoy the simpler CSI code that doesn't need to manage snapshot lifecycles.
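
A minimal sketch of the pre-Mimic flow proposed here (protect, clone, flatten, unprotect); pool, image, and snapshot names are illustrative, and the real plugin would also pass credential flags.

// Minimal sketch of the pre-Mimic restore flow: protect the snapshot, clone
// it, flatten the clone so it no longer depends on the parent, then
// unprotect the snapshot. All names are illustrative.
package main

import "os/exec"

func rbd(args ...string) error { return exec.Command("rbd", args...).Run() }

func restoreFromSnapshot() error {
    steps := [][]string{
        {"snap", "protect", "rbd/pvc-image@csi-rbd-snapshot-x"},
        {"clone", "rbd/pvc-image@csi-rbd-snapshot-x", "rbd/pvc-restored"},
        {"flatten", "rbd/pvc-restored"}, // copy parent data so the clone stands alone
        {"snap", "unprotect", "rbd/pvc-image@csi-rbd-snapshot-x"},
    }
    for _, s := range steps {
        if err := rbd(s...); err != nil {
            return err
        }
    }
    return nil
}

func main() { _ = restoreFromSnapshot() }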

Contributor Author

@dillaman I fully understand your concerns. I created #70. Thanks!

@@ -376,18 +432,191 @@ func deleteVolInfo(image string, persistentStoragePath string) error {
return nil
}

func getRBDVolumeByID(volumeID string) (rbdVolume, error) {
func persistSnapInfo(snapshot string, persistentStoragePath string, snapInfo *rbdSnapshot) error {


Use "rbd image-meta get/set" to associate the snapshot data directly within the RBD image instead of storing as a local file
